Bounded fitness landscapes and the evolution 
of the linguistic diversity 

Viviane M. de Oliveira, Paulo R. A. Campos, M. A. F. Gomes,

I. R. Tsang

February 2, 2008



Departamento de Física, Universidade Federal de Pernambuco, 50670-901,
Recife, PE, Brazil

Departamento de Física e Matemática, Universidade Federal Rural de
Pernambuco, 52171-900, Dois Irmãos, Recife-PE, Brazil

Centro de Informática, Universidade Federal de Pernambuco, 50670-901,
Recife, PE, Brazil



Abstract 

A simple spatial computer simulation model was recently introduced
to study the evolution of linguistic diversity [1]. The model
considers processes of selective geographic colonization, linguistic
anomalous diffusion and mutation. In that approach, each language is
assigned a fitness function that depends on the number of people who
speak it. Here we extend the model to examine the role of fitness
saturation in language dynamics. We find that the dependence of
linguistic diversity on area after colonization displays a power-law
regime with a nontrivial exponent, in very good agreement with the
measured exponent associated with the actual distribution of
languages on Earth.

1 Introduction



Research in language dynamics has aroused increasing interest in the
complex systems community in recent years. Most researchers focus
their investigations on issues such as the rise, competition, extinction
risk and death of languages [?]. Furthermore, recent advances in
archaeology, genetics and linguistics have contributed to a better
understanding of linguistic diversification [?]. Some investigations
have demonstrated that distinct causes have greatly affected the
evolution of linguistic diversity. Among the main elements are
geographic factors, economic features and the complexity of the
language, to cite just a few. For instance, Sutherland [?] has shown
that, besides country area, forest area and maximum altitude contribute
to increased diversity, whereas diversity decreases with increasing
latitude. According to Bellwood [?] and Renfrew [17], agricultural
expansion was responsible for the massive population replacements that
began about 10,000 years ago and caused the disappearance of many of
the Old World languages.

In a recent work, we investigated the evolution of linguistic
diversity by introducing a spatial computer simulation model that
considers a diffusive process able to generate and sustain diversity
[1]. The model describes the occupation of a given area by populations
speaking several languages. Each language was assigned a fitness value
$f$ proportional to the number of sites colonized by populations that
speak that language. In the process of colonization, language mutation
(differentiation) and language substitution can take place, which gives
rise to linguistic diversity. This simple model produces scaling laws
closely resembling those reported in [18].

In the current contribution, we study the dynamics of linguistic
diversity, but now we assume that the fitness of each language is
bounded by a given maximum (saturation) value, randomly chosen from a
uniform distribution. The saturation hypothesis mimics factors such as
the difficulty or ease of learning a language and economic conditions,
which permit some languages to propagate more easily than others.

The paper is organized as follows. In Section 2 we introduce the model.
In Section 3 we discuss the results. Finally, in Section 4 we present
the conclusions.

2 Model



Our model is defined on a two-dimensional lattice of linear size $L$,
composed of $A = L \times L$ sites with periodic boundary conditions.
Each lattice site $s_i$ represents a given region, which can be occupied
by a single population speaking just one language. We ascribe to each
site a capability $C_i$, whose value is drawn from a uniform
distribution on the interval 0-1. The capability represents the amount
of resources available to the population that colonizes that site. It
is implicit that the population size in each cell $s_i$ is proportional
to its capability $C_i$.
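
As a minimal sketch of the lattice set-up described above (not the
authors' code), the following Python fragment draws the site
capabilities and lists the four periodic nearest neighbors of a site.
The function names, the use of NumPy and the half-open interval [0, 1)
are assumptions made for illustration.

```python
import numpy as np


def make_capabilities(L, rng):
    """Draw one capability per site, uniform on [0, 1), for an L x L lattice."""
    return rng.random((L, L))


def neighbors(i, j, L):
    """Four nearest neighbors of site (i, j) with periodic boundaries."""
    return [((i - 1) % L, j), ((i + 1) % L, j),
            (i, (j - 1) % L), (i, (j + 1) % L)]


rng = np.random.default_rng(0)      # fixed seed, only for reproducibility
C = make_capabilities(4, rng)       # a tiny 4 x 4 lattice for illustration
corner_neighbors = neighbors(0, 0, 4)
```

With periodic boundaries, a corner site such as (0, 0) has neighbors on
the opposite edges of the lattice, so every site has exactly four
neighbors.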

In the first step of the dynamics, we randomly choose one site of the
lattice to be colonized by a single population that speaks the ancestor
language. Each language is labeled by an integer; as soon as a new
language arises, it is labeled by the next integer. To each language we
assign a fitness value $f$, calculated as the sum of the capabilities of
the sites where that language is spoken. In contrast to reference [1],
however, the fitness cannot exceed an integer value $\gamma_k$, which we
choose in the range 1-2000. This saturation value $\gamma_k$ is chosen
at random when language $k$ appears. Thus, the initial fitness of the
ancestor language is the capability of the initial site.
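
The saturation rule can be illustrated with a short Python sketch. It
assumes that the saturation value is drawn uniformly from the integers
1 to 2000, both ends included, and that the bounded fitness is the
smaller of the capability sum and the saturation value. The function
names are hypothetical.

```python
import numpy as np


def draw_saturation(rng, gamma_min=1, gamma_max=2000):
    """Draw an integer saturation value, assumed uniform on [gamma_min, gamma_max]."""
    return int(rng.integers(gamma_min, gamma_max + 1))


def bounded_fitness(site_capabilities, gamma_k):
    """Sum of the capabilities of a language's sites, capped at gamma_k."""
    return min(float(np.sum(site_capabilities)), float(gamma_k))


rng = np.random.default_rng(0)
gamma_k = draw_saturation(rng)
# A language holding three sites; its fitness stays below any gamma_k >= 1.
f = bounded_fitness([0.2, 0.3, 0.1], gamma_k)
```

Under this reading, a language with a small saturation value stops
gaining fitness after occupying only a few sites, however large its
territory becomes.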

In the second step, one of the four nearest neighbors of the site
containing the ancestor language is chosen to be colonized, with
probability proportional to its capability. We assume that regions
containing larger amounts of resources are likely to be colonized faster
than poor regions. The chosen site is then occupied by a population
speaking the ancestor language or a mutant version of it. Mutations are
the mechanism responsible for generating diversity and, together with
natural selection, maintain the standing level of diversity in the
system. The probability of a mutation occurring during propagation is
$p = \alpha/f$, where $\alpha$ is a constant, so the mutation probability
is inversely proportional to the fitness of the language. The form of
the mutation probability $p$ is inspired by population genetics, where
the best-adapted organisms are less likely to mutate than poorly adapted
ones [19]. The probability of reverse mutations is zero; that is, the
language generated by a mutation is always different from the previous
ones.
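
A small numerical illustration of the mutation rule $p = \alpha/f$
follows. The text does not say what happens when $f < \alpha$, where the
ratio exceeds one, so this sketch clips the probability at 1. That
clipping, like the function name, is an assumption.

```python
def mutation_probability(alpha, fitness):
    """Mutation probability alpha / fitness, clipped at 1 (clipping is an assumption)."""
    if fitness <= 0:
        raise ValueError("fitness must be positive")
    return min(1.0, alpha / fitness)


# With alpha = 0.3, a language of fitness 3 mutates in 10% of propagation
# events, while a newly born language of fitness 0.1 always mutates.
p_established = mutation_probability(0.3, 3.0)
p_newborn = mutation_probability(0.3, 0.1)
```

The example shows the intended effect of the rule: languages with a
large bounded fitness propagate almost unchanged, whereas languages with
small fitness differentiate readily.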

In the subsequent steps, we check the empty sites located on the
boundary of the colonized cluster and choose one of them according to
their capabilities. Again, sites with higher capabilities are more
likely to be occupied. We then choose the language to be adopted in the
chosen cell from among the languages occupying the neighboring sites;
languages with higher fitness have a higher chance of expanding. The
process continues while there are empty sites in the lattice. After
completion, we count the total number of languages $D$. To give the
reader some insight into the model, Figure 1 presents a snapshot of a
typical realization of the dynamics at the moment when all sites have
first been colonized (in this figure the gray scale represents different
languages). The striated linguistic domains in Figure 1, with very small
territories occupied by different languages, are reminiscent of the
actual distribution of languages in the Caucasus region between the
Black and Caspian Seas, a relatively small area of 300,000 km$^2$ where
languages of the Caucasian, Indo-European and Altaic families coexist,
distributed among a large variety of peoples [13].

Figure 1: Snapshot of a typical realization of the dynamics at the first
moment of colonization of all sites. The saturation values $\gamma_k$
are randomly chosen in the interval 1-2000. The lattice size is
$L = 500$ and $\alpha = 0.3$. See text for details.
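
The colonization dynamics described in this section can be sketched in
Python as follows. This is an illustrative reading of the rules, not the
authors' implementation. It makes four assumptions: capabilities are
drawn on (0, 1] to avoid zero weights; the language of a newly occupied
site is chosen among the distinct languages of its occupied neighbors,
with probability proportional to their bounded fitness; saturation
values are uniform integers in 1-2000; and the mutation probability is
clipped at 1. All names are hypothetical.

```python
import numpy as np


def simulate(L, alpha, gamma_max=2000, seed=0):
    """Colonize an L x L periodic lattice; return the language map and D."""
    rng = np.random.default_rng(seed)
    C = 1.0 - rng.random((L, L))            # capabilities in (0, 1]
    lang = np.full((L, L), -1, dtype=int)   # -1 marks an empty site
    raw, gamma = [], []                     # capability sum and cap per language
    boundary = set()                        # empty sites next to the cluster

    def nbrs(i, j):
        return [((i - 1) % L, j), ((i + 1) % L, j),
                (i, (j - 1) % L), (i, (j + 1) % L)]

    def new_language():
        raw.append(0.0)
        gamma.append(int(rng.integers(1, gamma_max + 1)))
        return len(raw) - 1

    def occupy(site, k):
        lang[site] = k
        raw[k] += C[site]
        boundary.discard(site)
        boundary.update(n for n in nbrs(*site) if lang[n] < 0)

    # First step: the ancestor language colonizes a random site.
    start = (int(rng.integers(L)), int(rng.integers(L)))
    occupy(start, new_language())

    while boundary:
        # Choose an empty boundary site with probability ~ capability.
        sites = sorted(boundary)
        w = np.array([C[s] for s in sites])
        site = sites[rng.choice(len(sites), p=w / w.sum())]
        # Choose the invading language with probability ~ bounded fitness.
        cand = sorted({int(lang[n]) for n in nbrs(*site) if lang[n] >= 0})
        fit = np.array([min(raw[k], gamma[k]) for k in cand], dtype=float)
        k = cand[rng.choice(len(cand), p=fit / fit.sum())]
        # Mutate with probability alpha / f; a mutant gets a new label.
        f = min(raw[k], gamma[k])
        if rng.random() < min(1.0, alpha / f):
            k = new_language()
        occupy(site, k)

    return lang, int(len(np.unique(lang)))


if __name__ == "__main__":
    lang_map, D = simulate(L=20, alpha=0.3)
    print("number of languages D =", D)
```

The boundary set is re-sorted at every step to keep runs reproducible
for a given seed. This is simple but slow, so the sketch is suited to
small lattices rather than the $L = 500$ case shown in Figure 1.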



3 Results and Discussion 

In Figure 2, we show the diversity $D$ as a function of the area $A$
(total number of sites in the lattice) for mutation parameter
$\alpha = 0.3$ and saturation values defined in the interval 1-2000. The
points are averages over 100 independent simulations when $L < 400$ and
over 20 simulations when $L = 500$. We observe that the curve presents
just one scaling region, $D \sim A^z$, which extends over five decades.
The exponent $z = 0.39 \pm 0.01$ is in quite satisfactory agreement with
the exponent observed for the actual distribution of languages on Earth.
For the sake of completeness, Figure 2 also shows the observed values
(*) of diversity versus area obtained in reference [18] for all
languages spoken on Earth (the ten data points are associated with the
interval from $A = 50$ km$^2$ to $A = 10^{?}$ km$^2$ of the actual
distribution). We note in passing that, although there is no perfect
scaling relationship between diversity and area over five decades in
area, both the simulation and the actual data for $D(A)$ curiously seem
to be modulated by a similar tendency to oscillate about the main
scaling behavior (the deviations from perfect scaling in the actual data
are unrelated to the choice of bins). We have also investigated the
situation in which the saturation value is the same for all languages.
We observe a linear growth of diversity with area when the maximum
$\gamma$ is very small. For large values of $\gamma$ there are two
scaling regions, and for very large values of $\gamma$ we recover the
result obtained for the case where the fitness is not limited [1].

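One common way to estimate a scaling exponent such as $z$ in
$D \sim A^z$ is a least-squares straight-line fit in log-log
coordinates. The text does not state how the exponent was fitted, so the
Python sketch below is only an assumed procedure, and the data in it are
synthetic values generated for the example, not results from the paper.

```python
import numpy as np


def fit_power_law(area, diversity):
    """Fit D = c * A**z by linear regression of log10(D) on log10(A)."""
    slope, intercept = np.polyfit(np.log10(area), np.log10(diversity), 1)
    return slope, 10.0 ** intercept


# Synthetic, noise-free data following D = 2 * A**0.39 (illustration only).
area = np.array([1e2, 1e3, 1e4, 1e5])
diversity = 2.0 * area ** 0.39
z, c = fit_power_law(area, diversity)
```

For simulation output, `area` would hold the lattice sizes $L^2$ and
`diversity` the corresponding averages of $D$ over independent runs.
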
Figure 3 displays the number of languages with population size greater
than $N$, $n(>N)$, as a function of $N$. To obtain the curves, we assume
that the population in a given site is proportional to the capability of
the site; specifically, we take the population of a site to be its
capability multiplied by a factor of 100. In the plot, the parameter
values are $L = 500$ and $\alpha = 0.3$. In close analogy with the
distribution of languages on Earth [18], we find two distinct scaling
regimes $n(>N) \sim N^{-\tau}$: $\tau = 0.35 \pm 0.01$ for
200 < $N$ < 2,000,000, and $\tau = 1.14 \pm 0.01$ for
2,000,000 < $N$ < 10,000,000. The inset exhibits the differential
distribution of languages spoken by a population of size $N$, $n(N)$.
This distribution also agrees with the one observed for languages on
Earth [18, ?]; in particular, it is well described by the lognormal
function $n(N) = \exp\left[-\frac{1}{2\sigma^2}(\log N - \mu)^2\right]$,
with $\sigma = 0.41$ and $\mu = 0.42$ (continuous curve in the inset).
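
The cumulative count $n(>N)$ can be computed from a language map and the
site capabilities as in the Python sketch below, which uses the stated
convention that the population of a site is 100 times its capability.
The function names and the tiny 2 x 2 example are hypothetical, chosen
only to make the arithmetic easy to follow.

```python
import numpy as np


def language_populations(lang, C, factor=100):
    """Population of each language: factor times the capability of its sites."""
    return np.bincount(lang.ravel(), weights=factor * C.ravel())


def cumulative_count(populations, thresholds):
    """Number of languages with population strictly greater than each threshold."""
    pops = np.sort(populations)
    return len(pops) - np.searchsorted(pops, thresholds, side="right")


# Toy example: language 0 holds one site, language 1 holds three sites.
lang = np.array([[0, 1], [1, 1]])
C = np.array([[0.5, 0.2], [0.3, 0.1]])
pops = language_populations(lang, C)          # about 50 and 60
counts = cumulative_count(pops, np.array([10.0, 55.0, 100.0]))
```

Plotting such counts against the thresholds on logarithmic axes gives a
curve of the kind shown in the main plot of Figure 3.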

4 Conclusions 

We have introduced a model for the evolution of linguistic diversity
that considers a bounded fitness value for languages. We chose the
saturation value of the fitness of each language at random in order to
mimic the fact that different languages face different conditions for
propagation. We observe a considerable improvement in the results
compared with the earlier approach [1]. Now the relationship between
diversity and area presents just one scaling regime. For $\alpha = 0.3$
we obtain $z = 0.39 \pm 0.01$, which is in very good agreement with the
exponent observed for the languages on Earth [18] over five decades of
variation in area. We have also observed that the exponents $\tau$ for
the two power-law regimes in $n(>N)$ as a function of $N$ are closer to
those obtained from empirical observations [18].
In order to compare other kinds of saturation conditions, we have also
studied the case where the saturation values are the same for all
languages. Under this condition, we could not reproduce the basic
relationship between diversity and area observed for the actual
distribution of languages, although for the very particular and
unrealistic case where $\alpha = 0.01$ and $\gamma = 1$ we can perfectly
reproduce the differential distribution of languages spoken by a
population of size $N$, $n(N)$, as well as the number of languages with
population size greater than $N$, $n(>N)$, as a function of $N$. Our
results seem to demonstrate that different assumptions about the
behavior of the fitness function have very important consequences for
the characteristics of language spreading.

Figure 2: Number of languages $D$ as a function of the area $A$ for
$\alpha = 0.3$. The exponent is $z = 0.39 \pm 0.01$. The asterisks
represent data from the actual distribution of languages on Earth. See
text and Figure 1 of reference [18] for details.

Figure 3: Main plot: number of languages with population greater than
$N$, $n(>N)$, as a function of $N$. $n(>N) \sim N^{-\tau}$ with
$\tau = 0.35 \pm 0.01$ for 200 < $N$ < 2,000,000 and
$\tau = 1.14 \pm 0.01$ for 2,000,000 < $N$ < 10,000,000. Inset:
corresponding differential distribution $n(N)$ with lognormal best fit
(continuous line). See text for details.